The Next Generation of Content Addressable Memories



The Challenging Market Environment for CAMs 

The explosive growth in the use of Intranets and the Internet is taxing LANs to the limit.


Fast-path L3 routing is now typically implemented in hardware. New requirements,
including flow analysis, policy-based routing and end-to-end QoS, are rapidly increasing
the variety and quantity of lookups required. Every packet must now be examined
multiple times, and large hardware bridge/router implementations require a forwarding
decision to be made every few nanoseconds.

Table sizes are also growing. Even though Classless Inter-Domain Routing (CIDR) has
managed to slow the growth of Internet routing tables, flow analysis and routing based on
flow ID are creating the need for deeper and wider tables in bridges and routers.
Currently, Internet routing tables can hold as many as 55,000 entries, and this is
expected to grow by 5,000 entries per year for the foreseeable future. Flow ID table sizes
are ultimately limited by the cost of implementation.

Table searching is one of the most time-consuming operations in a router, and this
bottleneck typically limits the forwarding capability of current routers. Traditionally,
network system architects have been forced to address the problem with hashing or
tree-searching algorithms, which are acceptable for moderate-performance applications.
First-generation CAMs did not achieve widespread adoption for bridging and routing
applications because they were slow, expensive and small and, most importantly, did not
support longest-match searches. A new generation of CAMs now addresses these issues.
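To illustrate why tree searching limits lookup rate, here is a minimal Python sketch of a binary (one bit per level) trie performing longest-prefix match on IPv4 addresses. It is a textbook technique, not any vendor's algorithm, and the class names are hypothetical. The sketch counts node visits per lookup; in a real implementation each visit is typically a separate memory access.

```python
import ipaddress
from typing import List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None, None]  # bit 0, bit 1
        self.next_hop: Optional[str] = None  # set if a prefix ends here


class BinaryTrie:
    """Unibit trie for IPv4 longest-prefix match (illustrative only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, length: int, next_hop: str) -> None:
        bits = int(ipaddress.IPv4Address(prefix))
        node = self.root
        for depth in range(length):            # consume bits MSB first
            bit = (bits >> (31 - depth)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Tuple[Optional[str], int]:
        """Return (longest-match next hop or None, number of nodes visited)."""
        bits = int(ipaddress.IPv4Address(address))
        node: Optional[TrieNode] = self.root
        best, visits, depth = None, 0, 0
        while node is not None:
            visits += 1                        # one memory access per node
            if node.next_hop is not None:
                best = node.next_hop           # longest match seen so far
            if depth == 32:
                break
            node = node.children[(bits >> (31 - depth)) & 1]
            depth += 1
        return best, visits


trie = BinaryTrie()
trie.insert("10.0.0.0", 8, "A")
trie.insert("10.1.0.0", 16, "B")
print(trie.lookup("10.1.2.3"))     # ('B', 17)
print(trie.lookup("10.2.0.0")[0])  # A
```

In this sketch a lookup that ends on a /16 route visits 17 nodes, and the count grows with prefix length; a CAM resolves the same question in a single search.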



The new generation of CAMs 

Technology now enables commercially viable, adequately sized CAMs, and system designers
are increasingly making the transition from familiar but lower-performance solutions to
this new breed of CAMs. High-performance CAMs, capable of tens of millions of searches
per second and cascadable to create table depths of over 500K entries, are now becoming
available.



What is a CAM?

A Content Addressable Memory is a device designed to accelerate any application that
requires extremely fast searches of list-based data. To best understand what a CAM does,
it helps to contrast it with conventional random access memory. In RAM, data is stored in
specific locations called addresses. To retrieve the data, an address is supplied to the
memory, which in turn returns the data.

In a CAM, the opposite occurs. Data is supplied to the memory via a special comparand
register, and the memory returns an address if a matching entry is found. This enables
extremely quick searches: the entire CAM is searched in a single clock cycle and, if a
match is found, the returned address is used to retrieve the data associated with the
search string. The associated data is typically stored in a separate, discrete memory at
a location specified by the result of the CAM search.
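The search-then-fetch sequence described in "What is a CAM?" can be modelled in a few lines. This is a minimal Python sketch, not a model of any particular device: the class name, the lowest-address-wins rule for multiple matches and the MAC-style keys are illustrative assumptions. The loop stands in for a comparison the CAM performs on every entry in parallel in one cycle.

```python
from typing import List, Optional


class BinaryCAM:
    """Toy model of a binary CAM: store words, search by content."""

    def __init__(self, depth: int) -> None:
        self.entries: List[Optional[int]] = [None] * depth  # None = empty

    def write(self, address: int, word: int) -> None:
        self.entries[address] = word

    def search(self, comparand: int) -> Optional[int]:
        """Return the address of the matching entry, or None on a miss."""
        # A real CAM compares all entries at once; if several match, this
        # model reports the lowest address (an assumed priority rule).
        for address, word in enumerate(self.entries):
            if word is not None and word == comparand:
                return address
        return None


cam = BinaryCAM(depth=4)
cam.write(0, 0x00A0C9112233)             # illustrative 48-bit MAC addresses
cam.write(1, 0x00A0C9445566)
sram = ["port 3", "port 7", None, None]  # associated data, indexed by CAM address

address = cam.search(0x00A0C9445566)     # step 1: search the CAM
if address is not None:
    print(sram[address])                 # step 2: read associated data -> port 7
```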

Feature-rich, the leading-edge CAMs provide functionality including automatic aging,
multiple search mask registers, glueless depth expansion, one-cycle auto-learning and
multiple-match priority resolution. These features are all essential for the new generation
of switches and routers. Ternary CAMs, able to store 0, 1 and X ("don't care"), address the
needs of emerging applications such as CIDR, flow analysis, advanced VLAN support
and L4 awareness. Per-bit ternary CAMs enable the single-cycle longest-match searches
required for high-performance routing.
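How a per-bit ternary CAM yields a longest match in one search can be sketched as follows. This is a minimal Python model under two stated assumptions: each entry is a (value, care) pair where a 0 bit in `care` means "don't care", and routes are stored longest prefix first so that a lowest-address-wins priority resolver returns the longest matching prefix. The names are hypothetical.

```python
from typing import List, Optional, Tuple

# A ternary entry is (value, care). A 1 bit in `care` means "compare this bit";
# a 0 bit means "don't care" (X). This encoding is an assumption for the sketch.
Entry = Tuple[int, int]


def ternary_search(entries: List[Entry], key: int) -> Optional[int]:
    """Return the lowest address whose cared-about bits all equal the key's."""
    # Hardware compares every entry in parallel; the loop models the
    # multiple-match priority resolver, which picks the lowest address.
    for address, (value, care) in enumerate(entries):
        if (key ^ value) & care == 0:
            return address
    return None


# Routes stored longest prefix first, so the lowest matching address is
# also the longest matching prefix.
table: List[Entry] = [
    (0x0A010000, 0xFFFF0000),  # 10.1.0.0/16
    (0x0A000000, 0xFF000000),  # 10.0.0.0/8
    (0x00000000, 0x00000000),  # default route: every bit is X
]
next_hop = ["B", "A", "default"]

print(next_hop[ternary_search(table, 0x0A010203)])  # 10.1.2.3 -> B
print(next_hop[ternary_search(table, 0x0A020000)])  # 10.2.0.0 -> A
print(next_hop[ternary_search(table, 0xC0A80001)])  # 192.168.0.1 -> default
```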

Not limited to routing, CAMs can be used in many application areas including packet 
classification, compression, cryptography, pattern recognition, and parallel data 
processing. Larger, faster CAMs offer many advantages as the demand for bandwidth
continues to grow.



Policy and Flow-based Applications

The various ways in which the Internet is being used, combined with the ever-increasing
need for bandwidth, have created the requirement to assign different priority levels to
different applications. To address this requirement, RSVP and Differentiated Services
have been developed to allow network providers to give preferential treatment to certain
types of traffic. Differentiated Services further implies that traffic can be divided into
separate classes, each of which can then be given a specific priority as it travels through
the network.

QoS depends fundamentally on the ability to identify, classify and mark network traffic.
Without these basic building blocks, attempts to provide this kind of service will likely
not produce the desired behaviour in the network.

QoS is currently being delivered in two basic ways. Policy-based routing is concerned
with generic classification of the packet contents for service-level determination. Flow-based
routing assigns a service level to uniquely identified dialogues (or "flows").
Policy-based routing can assign QoS based on the Type of Service (TOS) bits as defined
by Differentiated Services, or by ad hoc methods (e.g. classifying by application port
number: FTP vs. Telnet vs. HTTP).

Flow ID is intended to identify the end-to-end application stream of which a given
packet is a member. Once a packet has been assigned to a flow, it can be forwarded with an
associated QoS. A flow can be uniquely identified by a 4-tuple consisting of source IP
address, source TCP or UDP port, destination IP address and destination TCP or UDP
port, as shown in the table below.



Field                        Width
Source IP Address            32 bits
Source TCP/UDP Port          16 bits
Destination IP Address       32 bits
Destination TCP/UDP Port     16 bits
This represents a minimum of 96 bits to uniquely identify a flow in IPv4. Adding
information such as the previous hop further increases the width of data to be
processed. IPv6 increases the width considerably with the proposed 128-bit source and
destination addresses. Future extension headers, such as the hop-by-hop options header
and the routing header, will also need to be processed by routers.
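The width arithmetic above can be checked with a short sketch that packs the 4-tuple into a single search key. The field order follows the table; the function name and the bit-packing layout are illustrative assumptions, since a real device defines its own key format. For IPv6 the same four fields already need 2 × 128 + 2 × 16 = 288 bits, before any extension-header fields are added.

```python
import ipaddress
from typing import Tuple


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Tuple[int, int]:
    """Pack a flow 4-tuple into one search key; return (key, width_in_bits).

    Field order follows the table: source IP, source port, destination IP,
    destination port. The layout is illustrative, not a device key format.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must be the same IP version")
    for port in (src_port, dst_port):
        if not 0 <= port <= 0xFFFF:
            raise ValueError("TCP/UDP ports are 16-bit values")
    addr_bits = src.max_prefixlen          # 32 for IPv4, 128 for IPv6
    key = int(src)
    key = (key << 16) | src_port
    key = (key << addr_bits) | int(dst)
    key = (key << 16) | dst_port
    return key, 2 * addr_bits + 2 * 16


_, width_v4 = flow_key("10.0.0.1", 1024, "10.0.0.2", 80)
_, width_v6 = flow_key("2001:db8::1", 1024, "2001:db8::2", 80)
print(width_v4, width_v6)  # 96 288
```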

The Resource ReSerVation Protocol (RSVP) is an IP-based protocol used to communicate
application QoS requirements to intermediate transit nodes in a network. RSVP uses a
soft-state mechanism to maintain path and reservation state in each node along the
reservation path. Per-flow differentiation allows similar QoS characteristics to be defined
for particular end-to-end IP sessions.

One major architectural issue in networks that maintain flow-state information is the
computational and system resource impact of maintaining state for the potentially
thousands of flows established across a diverse network. The problem is further
exacerbated when maintaining and manipulating flow information for up to 256,000
active flows in the core of the Internet at any given time. Content addressable
memories with the appropriate architecture are ideally suited to address this concern.

CAMs and Network Processors

For some time, centralized software solutions for classification and forwarding lookups
have limited router performance. In current architectures, the lack of suitable CAMs has
driven the development of multi-probe hash lookup schemes and complex tree-walking
architectures to achieve the necessary lookup performance. These solutions did not have
enough flexibility to address the current QoS needs supported by flow- and policy-based
routing. Today, there is considerable interest in dedicated processors, called Network
Processors (NPs), that manage packet classification, admission control and forwarding
decisions in a decentralized fashion.

The Network Processor would use a simplified instruction set and very high-speed
operation to parse the packet and accelerate table lookup schemes in dedicated memory
on a per-port (or small aggregate of ports) basis. The NP provides flexibility, but can be
somewhat non-deterministic and requires distributing software and databases throughout
the system. This can increase system cost.

Ternary CAMs also provide flexibility, allowing the OEM to implement packet
classification, determine QoS and make the forwarding decision in a series of
pipelined lookups. This can be accomplished at a lower overall cost, even on a per-port
basis. More importantly, MOSAID's CAMs support much higher throughput, allowing
more ports to be aggregated together and providing a solution that scales to meet future
router needs.
DRAM CAMs

Binary CAMs capable of storing large tables are now in production and are widely
available to system architects. These CAMs are well suited to exact-match searches.
New applications such as CIDR, flow analysis, advanced VLAN support and L4
awareness are all creating a need to store "don't cares" in addition to ones and zeros.
Ternary CAMs, capable of storing 0, 1 or X, are required to bring the benefits of CAMs
to these applications.

Binary SRAM CAMs are typically implemented using a 10-transistor (10T) cell. Early
attempts at storing ternary data involved two 10T SRAM cells and software machinations.
Even now, ternary capability in SRAM requires a 16-transistor CAM cell.




Figure 1: 16T Ternary SRAM Cell

A DRAM CAM enables a denser CAM, with a much smaller cell size than competing
SRAM-based solutions. DRAM CAMs can be designed to be as large as or larger than any
commercially available monolithic ternary CAM on the market today. DRAM CAM cells
can be implemented in just 6 transistors, representing a 2.5 times density advantage over
competing SRAM technology.
Figure 2: MOSAID 6T Ternary DRAM Cell
Because DRAM CAM cells are inherently ternary, each 6-transistor cell is capable of storing
0, 1 or X with no additional overhead. The end result is a lower-power, denser and more
cost-effective approach to CAM implementation. Because the overhead for the interface,
registers and multiple match resolver is fixed, DRAM CAMs scale to increasingly wide
word implementations with only a minimal die-size penalty. The ability to build
extremely wide CAMs is critical for emerging applications such as flow analysis, RSVP
and IPv6.

A reality of any DRAM-based system is the need for refresh. Designers are typically
familiar with DRAM and have experience dealing with this requirement. Most
applications tend to issue searches in bursts, leaving free cycles in which refresh
commands can easily be scheduled. Should system designers wish to avoid refresh
scheduling altogether, MOSAID CAMs also provide an auto-refresh mode of operation.

MOSAID DRAM CAMs offer performance that equals or exceeds any currently
available SRAM solution. As with many fast, synchronous systems, MOSAID CAMs are
pipelined. A pipelined architecture allows optimized read and write throughput, and
MOSAID has applied this in its CAMs to provide write performance equaling search
speeds. The pipelined architecture also allows multiple CAMs to be cascaded for deeper
tables with no degradation in performance and only one additional cycle of latency on the
first search.
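Cascading and its latency cost can be modelled with a minimal Python sketch. It assumes equal-depth devices, that device 0 has the highest priority, and that within a device the lowest address wins; these are common conventions, not details confirmed for the Class-IC. The base latency of 4 cycles is a made-up figure for illustration; only the one extra cycle comes from the text.

```python
from typing import List, Optional


def cascade_search(devices: List[List[Optional[int]]], key: int) -> Optional[int]:
    """Global address of the highest-priority match across cascaded CAMs.

    Assumes equal-depth devices, device 0 has the highest priority, and within
    a device the lowest local address wins.
    """
    for index, entries in enumerate(devices):
        for local, word in enumerate(entries):
            if word is not None and word == key:
                return index * len(entries) + local
    return None


def pipelined_cycles(searches: int, base_latency: int, cascaded: bool) -> int:
    """Cycles to finish `searches` back-to-back searches.

    Cascading adds one cycle of latency to the first result; after that the
    pipeline still delivers one result per cycle.
    """
    latency = base_latency + (1 if cascaded else 0)
    return latency + (searches - 1)


devices = [[0x11, 0x22], [0x33, 0x22]]  # two hypothetical 2-entry devices
print(cascade_search(devices, 0x22))    # 1 (device 0 wins the tie)
print(cascade_search(devices, 0x33))    # 2
print(pipelined_cycles(1000, 4, False), pipelined_cycles(1000, 4, True))  # 1003 1004
```

Over a long burst the extra cycle is amortized, which is why sustained search throughput is unaffected by the depth of the cascade.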

DRAM CAMs deliver highly scalable, high-performance solutions with full ternary
capability. They provide the ultimate flexibility for evolving applications like flow ID
and classification, sorting by Layer 2, 3 and 4 address functions.



Highlights of the MOSAID Class-IC CAM
Per Bit Ternary 

The inherent ternary capability of MOSAID's DRAM-based CAM cell and the
multiple match resolver in the Class-IC combine to provide the highest flexibility and
support the broadest range of CAM applications.

Router tables requiring single-cycle, longest-match searches can of course be supported
directly in the Class-IC CAM, but its per-bit ternary nature does not limit it to a specific
application (e.g. CIDR). Per-bit ternary capability is important for the encoding
techniques required to achieve single-search packet classification, and it also supports
default cases for advanced VLAN support, tailored flood vectors for forwarding, partial
matches on flow IDs and other innovative applications. Multiple masks and partitioning
capability support multiple search scenarios on a single database, and multiple
databases on the same device.

System-level Features 

Associated with each CAM entry are a number of "special bits." These bits encode the
type and validity of the entry. In the first-generation Class-IC device, Empty, Skip,
Permanent and Age bits are provided. Empty status is an obvious requirement for
updating the table. Skip is important for managing pre-allocated but empty locations in
the CAM, and also allows the user to "walk through" a series of multiple matches. Age is
a single-bit age indication, updated whenever an entry is referenced, so that "stale" CAM
entries can be purged after a desired interval. Permanent protects an entry against
purging due to age. The MOSAID Class-IC CAM can purge all the "old" entries in two
clock cycles, minimizing the impact on system performance (throughput).

Learning and aging support is included in the Class-IC CAM primarily for Layer 2
bridging applications in switches. It is also key to the new, emerging flow-based routers,
where there may be a need to learn flows automatically and age them out as well. In
automatic learning, the user can specify that different masks be used for the search and
for the learn operation, so that ternary fields can be automatically encoded into the learned
data. The Class-IC supports single-cycle learns, sustaining full system throughput.
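Learning with separate search and learn masks can be sketched as "search, and on a miss write the key into the first empty location with the learn mask applied". In this hypothetical Python model a 1 bit in a mask means "compare this bit" (real devices may use the opposite polarity), and the class and method names are illustrative. The device does this in a single cycle; the loops only model the result.

```python
from typing import List, Optional, Tuple


class LearningTCAM:
    """Toy ternary CAM with search-and-learn using separate masks."""

    def __init__(self, depth: int) -> None:
        # Each entry is (value, care) or None when empty.
        self.entries: List[Optional[Tuple[int, int]]] = [None] * depth

    def search(self, key: int, search_mask: int) -> Optional[int]:
        for address, entry in enumerate(self.entries):
            if entry is None:
                continue
            value, care = entry
            # A bit is compared only if both the entry and the global
            # search mask care about it.
            if (key ^ value) & care & search_mask == 0:
                return address
        return None

    def search_and_learn(self, key: int, search_mask: int,
                         learn_mask: int) -> Tuple[int, bool]:
        """Return (address, learned). On a miss, store key under learn_mask."""
        hit = self.search(key, search_mask)
        if hit is not None:
            return hit, False
        for address, entry in enumerate(self.entries):
            if entry is None:
                # Bits outside learn_mask become "don't care" in the new entry.
                self.entries[address] = (key & learn_mask, learn_mask)
                return address, True
        raise RuntimeError("table full")


FULL = 0xFFFFFFFF
cam = LearningTCAM(depth=4)
# Learn with the low 16 bits masked so later keys that differ only there hit.
print(cam.search_and_learn(0x12340001, FULL, 0xFFFF0000))  # (0, True)
print(cam.search_and_learn(0x1234BEEF, FULL, 0xFFFF0000))  # (0, False)
print(cam.search_and_learn(0x56780001, FULL, 0xFFFF0000))  # (1, True)
```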

If learning and aging support is not required by the particular application, the "special
bits" can be used to partition the CAM into different segments supporting multiple
databases without wasting CAM entry bits. Any of the Skip, Permanent and Age bits can
be recovered, providing up to eight different partitions.
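The "up to eight partitions" figure follows from three recovered bits (2^3 = 8). A minimal way to picture it is to treat those bits as a tag stored with each entry and supplied with each search. This Python model is hypothetical; how the Class-IC actually maps the Skip, Permanent and Age bits to partitions is not described here, and the 16-bit key width is an arbitrary choice.

```python
from typing import List, Optional

PARTITION_BITS = 3   # Skip, Permanent and Age bits reused as a tag (assumption)
KEY_WIDTH = 16       # hypothetical key width for the example


def tagged(partition: int, key: int) -> int:
    """Prefix a key with its 3-bit partition number."""
    if not 0 <= partition < (1 << PARTITION_BITS):
        raise ValueError("partition must be in 0..7")
    return (partition << KEY_WIDTH) | key


def partition_search(entries: List[int], partition: int, key: int) -> Optional[int]:
    """Search only within one partition by including the tag in the comparand."""
    target = tagged(partition, key)
    for address, word in enumerate(entries):
        if word == target:
            return address
    return None


entries = [tagged(0, 0xABCD), tagged(5, 0xABCD)]  # same key, two databases
print(1 << PARTITION_BITS)                   # 8 partitions
print(partition_search(entries, 5, 0xABCD))  # 1
print(partition_search(entries, 3, 0xABCD))  # None
```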

Interface Considerations

An important part of a CAM product offering, especially as CAM word widths get wider, 
is the physical interface offered by the chip. Minimizing the pin-count that is required of 
an OEM ASIC is very important. 

The SDQ (Search Data) interface on the Class-IC is bidirectional, so the ASIC designer
can perform both read and write operations through a single port.

A traditional synchronous interface is available for word widths up to 72 bits in
MOSAID's first Class-IC CAM. The CAM also features a Double Data Rate (DDR)
interface, as commonly found in today's advanced DRAMs. DDR allows the user to
clock data in on both edges of the synchronous clock, so a 66 MHz search rate can
be maintained even when the I/O width is less than the data width. This allows the ASIC
designer to work with only 36 SDQ pins for a 72-bit data width, or to support a 144-bit
data width with only 72 SDQ pins.
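The pin arithmetic can be checked with a short sketch. It assumes one search per clock cycle at 66 MHz and that each search needs the full word delivered over the SDQ pins; the function names are hypothetical.

```python
import math


def sdq_pins_needed(data_width_bits: int, ddr: bool) -> int:
    """SDQ pins needed to move one search word per clock cycle."""
    transfers_per_clock = 2 if ddr else 1   # DDR uses both clock edges
    return math.ceil(data_width_bits / transfers_per_clock)


def search_rate_msps(clock_mhz: float, data_width_bits: int,
                     pins: int, ddr: bool) -> float:
    """Millions of searches per second if each search needs a full word on SDQ."""
    bits_per_clock = pins * (2 if ddr else 1)
    clocks_per_search = math.ceil(data_width_bits / bits_per_clock)
    return clock_mhz / clocks_per_search


print(sdq_pins_needed(72, ddr=True), sdq_pins_needed(144, ddr=True))  # 36 72
print(search_rate_msps(66, 72, pins=36, ddr=True))   # 66.0
print(search_rate_msps(66, 72, pins=36, ddr=False))  # 33.0
```

Without DDR, the same 36 pins would need two clocks per 72-bit word, halving the search rate under these assumptions.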

Another important concern in minimizing the pin count is to ensure that the OEM can 
access and manipulate the associated data SRAM without requiring an additional 
interface on their ASIC. In the Class-IC device this is achieved by providing a 
specialized instruction that will directly pass an SRAM address from the SDQ pins of the
CAM out onto the MA bus to the SRAM address pins.

Looking Forward - Class-IC Roadmap

DRAM CAM technology offers a higher bit density than is possible using SRAM based 
solutions. This allows fully ternary CAMs of up to SMbits to be quickly developed in 
existing technology, with even larger CAMs to follow as new fabrication processes
become available. Wider CAMs can also easily be implemented in DRAM. Because
CAMs typically have fixed overhead for logic outside the CAM array, wider DRAM
CAMs deliver even higher silicon efficiency.
Speed grades can also easily be increased in current technologies. DRAM CAM arrays
and associated word-line drivers can be architected to provide significantly higher
performance with only a minor reduction in density, at the cost of increased power
consumption. DRAM CAMs can approach the random read/write performance of SRAM
solutions and achieve higher search throughput, while continuing to offer a size
advantage over SRAM.

Innovative techniques such as the SLDRAM 533 MHz bus interface could be used to
dramatically reduce the interface pin count to the OEM ASIC. Other advanced techniques
could be used to further improve propagation delays and setup times at the interface.

Specialized features can also easily be developed using DRAM CAM technology.
Although a DRAM CAM cell is inherently ternary, simplifying the interface by
communicating the ternary mask as the number of significant bits could be particularly
useful for specific applications like CIDR. While such a device would be less flexible, it
would be optimized for the application.
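The saving is easy to see: a 32-bit IPv4 route needs a full 32-bit mask, whereas its prefix length (0 to 32) fits in 6 bits. The following Python sketch shows the conversion such a device would perform internally; the function names are hypothetical, and the reverse direction shows that only masks contiguous from the most significant bit can be expressed this way.

```python
def mask_from_prefix_length(length: int, width: int = 32) -> int:
    """Per-bit ternary care mask for a prefix of `length` significant bits."""
    if not 0 <= length <= width:
        raise ValueError("prefix length out of range")
    return ((1 << length) - 1) << (width - length)


def prefix_length_from_mask(mask: int, width: int = 32) -> int:
    """Inverse mapping; rejects masks that are not contiguous from the MSB."""
    length = bin(mask).count("1")
    if mask_from_prefix_length(length, width) != mask:
        raise ValueError("mask is not a contiguous prefix mask")
    return length


print(hex(mask_from_prefix_length(8)))      # 0xff000000
print(hex(mask_from_prefix_length(22)))     # 0xfffffc00
print(prefix_length_from_mask(0xFFFF0000))  # 16
```

A general-purpose ternary CAM can still hold a non-contiguous mask such as 0xFF00FF00, which is exactly the flexibility a length-only interface gives up.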

The pipelined synchronous architecture of the current MOSAID Class-IC device enables
very high throughput for current and future generations of OEM systems.
The pipelined mode and high-speed synchronous interfaces with DLL clock schemes will
allow the MOSAID Class-IC product line to keep pace with the multi-gigabit and terabit
router/switch architectures currently on the drawing board. The pipelined architecture
ensures no performance degradation when multiple Class-IC devices are cascaded together.
Pipelining can also allow a seamless interface to SSRAM for associated data, and "early"
status flags to enable conditional processing, all without any penalty to sustained
throughput.

Looking forward, the system-level features of the MOSAID Class-IC family will
continue to evolve to meet requirements created by emerging applications. Future
releases will include features and functionality in such areas as simultaneous multiple
word-width support, "validity" bits on output, highly flexible partitioning, greater age
granularity, special features for "flow" aging (including limited purges), automatic
sorting of entries for longest-match support, and internal associated-data RAM.

For over 25 years, MOSAID Semiconductor has been at the forefront of DRAM design. 
MOSAID Semiconductor has been instrumental in the development of the last 9 
generations of commodity DRAM and has developed high performance, specialized 
DRAM interfaces including DDR and SLIO. MOSAID is committed to applying this
experience to provide industry leading CAM technology to alleviate bottlenecks and 
accelerate networking applications. 